Skip to content

[deployer] Bound SSH child execution#3714

Draft
BrendanChou wants to merge 2 commits into
mainfrom
bc/deployer-command-timeouts
Draft

[deployer] Bound SSH child execution#3714
BrendanChou wants to merge 2 commits into
mainfrom
bc/deployer-command-timeouts

Conversation

@BrendanChou
Copy link
Copy Markdown
Collaborator

@BrendanChou BrendanChou commented May 4, 2026

Sometimes deployer will hang while making sure instances are healthy. After a few minutes I am actually able to connect to the monitoring instance and see all of the instances are running. So the issue seems to be with the script hanging rather than the instances being legitimately unhealthy.

It seems that maybe the SSHing used to verify the instance health is not healthy for some reason (?). So instead we should have some sort of timeout and try again.

Codex vibed this fix.

Summary

  • Add per-child timeouts for SSH/SCP execution so wedged children cannot hang deployer operations forever.
  • Keep service polling bounded with short per-poll timeouts while preserving longer timeouts for installs, downloads, and profiling.
  • Add a timeout-specific deployer error for clearer failures.

@cloudflare-workers-and-pages
Copy link
Copy Markdown

cloudflare-workers-and-pages Bot commented May 4, 2026

Deploying monorepo with  Cloudflare Pages  Cloudflare Pages

Latest commit: fc3a6d0
Status: ✅  Deploy successful!
Preview URL: https://f2699699.monorepo-eu0.pages.dev
Branch Preview URL: https://bc-deployer-command-timeouts.monorepo-eu0.pages.dev

View logs

@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 4, 2026

Benchmark results

Tip

PASSED: No benchmark exceeded the regression threshold.

Benchmark comparison table
Benchmark Baseline (main) Current Delta Threshold Status
qmdb::merkleize/variant=any::unordered::fixed::mmr keys=10000 ch=false sync=false 1.564 ms 1.708 ms +9.22% 10.00% ✅ PASS

Baseline commit(s): de6ecda806f3

Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit b2c90ca. Configure here.

Comment thread deployer/src/aws/utils.rs Outdated
.output()
.await?;
.arg(command);
let output = command_output(cmd, "ssh", ip, command_timeout).await?;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Timeout errors bypass retry loop in SSH execute

High Severity

In ssh_execute_with_timeout, command_output is called with ?, so a CommandTimeout error immediately exits the retry loop. In contrast, poll_service_active, poll_service_inactive, and scp_download all explicitly catch CommandTimeout with a match, log a warning, and continue the loop. This means a single hung SSH connection during deployment commands (apt install, binary download, service stop) fails the entire operation instead of retrying, defeating the purpose of the retry loop.

Additional Locations (2)
Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit b2c90ca. Configure here.

@cloudflare-workers-and-pages
Copy link
Copy Markdown

cloudflare-workers-and-pages Bot commented May 4, 2026

Deploying with  Cloudflare Workers  Cloudflare Workers

The latest updates on your project. Learn more about integrating Git with Workers.

Status Name Latest Commit Updated (UTC)
✅ Deployment successful!
View logs
commonware-mcp fc3a6d0 May 04 2026, 10:42 PM

@BrendanChou BrendanChou marked this pull request as draft May 4, 2026 22:31
@codecov
Copy link
Copy Markdown

codecov Bot commented May 4, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 95.82%. Comparing base (de6ecda) to head (fc3a6d0).
⚠️ Report is 1 commits behind head on main.

@@            Coverage Diff             @@
##             main    #3714      +/-   ##
==========================================
- Coverage   95.83%   95.82%   -0.01%     
==========================================
  Files         458      458              
  Lines      180036   180036              
  Branches     4208     4208              
==========================================
- Hits       172531   172525       -6     
- Misses       6166     6169       +3     
- Partials     1339     1342       +3     

see 2 files with indirect coverage changes


Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update de6ecda...fc3a6d0. Read the comment docs.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant